Music is an essential part of many people’s lives, and with digital music streaming platforms like Spotify, discovering new music has become more accessible than ever. Spotify’s algorithm suggests playlists based on users’ listening habits and preferences. One of the most popular playlists on Spotify is “New Music Friday,” updated every week with new releases from various artists. It is curated by Spotify’s editorial team, who select the latest and most popular releases from various genres. However, the question remains: how accurately does this playlist align with personal musical taste and style?
This final portfolio explores the inner workings of Spotify’s music recommendation algorithm and attempts to build a model that predicts a user’s preferred musical genres and styles. The analysis focuses on two playlists, “Songs I Like” and “Songs I Dislike”, to identify patterns in personal musical taste. The “Songs I Like” playlist is a collection of songs that hold personal meaning, spanning pop, rock, musicals, and music from the 70s, 80s, and 90s. In contrast, the “Songs I Dislike” playlist consists of heavy metal and drill rap songs that do not match personal taste.
By analyzing the features of songs in both playlists, this portfolio aims to provide insights into personal musical preferences and to use this information to train a machine learning model that predicts preferred musical genres and styles. Additionally, the analysis will investigate whether this model can predict which songs from popular playlists like “New Music Friday” would resonate with the user, allowing for a more personalized listening experience.
In summary, this portfolio tries to offer an understanding of Spotify’s music recommendation algorithm and the potential for using machine learning to predict a user’s preferred musical genres and styles.
Embedded Spotify playlist: https://open.spotify.com/embed/playlist/6pl0C7qbIl5uoY3Tdf82oa
Embedded Spotify playlist: https://open.spotify.com/embed/playlist/4bJQX5w7W4wEnHLmWqUIVY

Based on this scatterplot, I tend to like loud songs, although the plot still covers a range of loudness values. I also seem to be drawn to high-energy songs: they cluster in the upper-right portion of the plot, where both energy and loudness are high.
The valence of the songs I like sits at or above roughly 0.5, mostly between 0.5 and 1.0. This suggests to me that I tend to prefer songs with a positive emotional valence, which could contribute to their appeal.
One interesting thing I noticed in the plot is that the songs I like are split roughly evenly between major and minor keys, as indicated by the color coding in the plot. This suggests to me that the mode of the songs I like doesn’t strongly influence my preference for them.
Overall, this plot provides insight into the features that make songs likable to me, highlighting the importance of loudness, energy, and valence in my musical taste, while mode appears to matter little.

Based on the scatterplot of songs I dislike, they are generally quieter than the songs I like. While they still tend to have high energy, they often have a lower valence. This suggests to me that I may prefer songs that are more uplifting or positive in emotional tone, rather than those that are more somber or negative.
As with the songs I like, the songs I dislike are split roughly evenly between major and minor keys, as indicated by the color coding in the plot. This suggests that the mode of a song is not a major factor in whether or not I like it.
Overall, this scatterplot provides insight into the musical features that I find unappealing in songs, highlighting the importance of loudness, energy, and valence in my musical taste.
Songs I like seem to have a higher valence than songs I dislike.
Songs I like seem to have an energy between 0.7 and 0.9, while songs I dislike seem to have an energy around 0.95.
Songs I like seem to be louder than songs I dislike.
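The three comparisons above amount to comparing per-playlist averages. The following is a minimal Python sketch of that comparison, assuming each track is available as a record with `valence`, `energy`, and `loudness` fields. The field names and all values are invented for illustration and are not the portfolio's data.

```python
from statistics import mean

# Hypothetical feature rows (made-up values, for illustration only).
# Loudness is in dB, so values closer to 0 are louder.
tracks = [
    {"playlist": "like", "valence": 0.72, "energy": 0.81, "loudness": -5.1},
    {"playlist": "like", "valence": 0.58, "energy": 0.74, "loudness": -6.0},
    {"playlist": "like", "valence": 0.90, "energy": 0.88, "loudness": -4.4},
    {"playlist": "dislike", "valence": 0.21, "energy": 0.96, "loudness": -7.9},
    {"playlist": "dislike", "valence": 0.35, "energy": 0.94, "loudness": -8.3},
    {"playlist": "dislike", "valence": 0.18, "energy": 0.97, "loudness": -7.2},
]


def playlist_means(rows, features=("valence", "energy", "loudness")):
    """Return {playlist: {feature: mean value}}."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["playlist"], []).append(row)
    return {
        name: {f: mean(r[f] for r in members) for f in features}
        for name, members in grouped.items()
    }


summary = playlist_means(tracks)
for name, stats in summary.items():
    print(name, {f: round(v, 2) for f, v in stats.items()})
```

Means hide the spread within each playlist, so a sketch like this complements the plots rather than replacing them.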
The selection of an appropriate corpus is essential to effectively achieve research objectives. In this study, a broad corpus was chosen that included diverse data related to the research question. However, the large volume of data made it difficult to identify significant patterns or trends that could adequately address the research question.
To overcome this challenge, the decision was made to focus on the outliers in the corpus. Specifically, the analysis focused on the extreme cases that were most divergent from the norm in terms of a specific timbre component. This approach allowed for the isolation and study of the outliers, leading to valuable insights and a better understanding of the factors contributing to their unique timbre characteristics. Ultimately, this approach strengthened the analysis and enhanced the quality of the research findings.
Timbre, the quality of sound that distinguishes different musical instruments, is a crucial aspect of music. The analysis of timbre is often performed using spectral content, which measures the relative strengths of various frequency components that make up the sound. In this study, the focus was on a specific timbre component, which was used to isolate and study the outliers in the corpus.
In the c02 tab, four outliers stand out, two in each playlist. In the ‘Songs I like’ playlist, ‘What I Like About You’ has the highest c02 value (109.29), whereas ‘The Sailor’s Warning’ has the lowest (-34.37). In the ‘Songs I dislike’ playlist, ‘Murder’ has the highest c02 value (145.18), while ‘19 Tini 5’ has the lowest (-81.28).

| Playlist | Track | Artists | value |
|---|---|---|---|
| Songs I dislike | Street Sense | Shawty Pimp, MC Spade | 37.45836 |
| Songs I like | Africa | TOTO | 36.25328 |
| Songs I dislike | 1984 | Slaughter to Prevail | 55.58922 |
| Songs I like | Starstruck | Years & Years | 54.76817 |

| Playlist | Track | Artists | value |
|---|---|---|---|
| Songs I dislike | 19 Tini 5 | TiniMaine | -81.27966 |
| Songs I like | The Sailor’s Warning | Faela | -34.36908 |
| Songs I dislike | Murder (feat. Tom Skeemask & GK) | DJ Squeeky, Tom Skeemask, GK | 145.18006 |
| Songs I like | What I Like About You | The Romantics | 109.29211 |

| Playlist | Track | Artists | value |
|---|---|---|---|
| Songs I dislike | Street Sense | Shawty Pimp, MC Spade | -109.32972 |
| Songs I like | Think About Things | Daði Freyr | -44.19746 |
| Songs I dislike | Wounds | ColdWorld | 59.98358 |
| Songs I like | golden hour | JVKE | 69.93170 |

| Playlist | Track | Artists | value |
|---|---|---|---|
| Songs I dislike | 2 Thick | DJ Zirk, Tha 2thick Family, Tom Skee, BuckShotz | -40.84057 |
| Songs I like | The Sailor’s Warning | Faela | -25.60552 |
| Songs I dislike | LIVING LEGEND | Scarlxrd | 30.74823 |
| Songs I like | The Way You Make Me Feel - 2012 Remaster | Michael Jackson | 29.41128 |

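The tables above list, for each playlist, the track with the lowest and the track with the highest value of a timbre component. The following is a minimal Python sketch of that selection step, not the code used to build the tables. The four named rows reuse the c02 values reported above. The two "placeholder" rows and the function name `extremes_by_playlist` are hypothetical and exist only so that each playlist has more than two tracks to search.

```python
# (playlist, track, c02 value). Named tracks come from the table above;
# the placeholder rows are hypothetical filler, not real data.
rows = [
    ("Songs I dislike", "19 Tini 5", -81.27966),
    ("Songs I like", "The Sailor's Warning", -34.36908),
    ("Songs I dislike", "Murder (feat. Tom Skeemask & GK)", 145.18006),
    ("Songs I like", "What I Like About You", 109.29211),
    ("Songs I dislike", "placeholder track A", 10.0),
    ("Songs I like", "placeholder track B", 20.0),
]


def extremes_by_playlist(rows):
    """Return {playlist: (lowest_row, highest_row)} ranked by component value."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[0], []).append(row)
    return {
        playlist: (
            min(members, key=lambda r: r[2]),
            max(members, key=lambda r: r[2]),
        )
        for playlist, members in grouped.items()
    }


extremes = extremes_by_playlist(rows)
for playlist, (low, high) in extremes.items():
    print(f"{playlist}: lowest = {low[1]} ({low[2]}), highest = {high[1]} ({high[2]})")
```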
A chromagram is a visual representation of how the energy in a musical recording is distributed across the twelve pitch classes over time. Comparing the chromagrams of two songs, “State of Unrest” from the playlist of disliked songs and “Shut up and Dance” from the playlist of liked songs, reveals interesting differences in their pitch distribution. In “State of Unrest,” energy is strongly concentrated around the pitch class D, indicating a relatively stable tonal center. “Shut up and Dance,” on the other hand, shows more variation, with energy spread across the pitch classes F#, G#, and A#. This suggests that “Shut up and Dance” has more varied harmonic content, although a chromagram shows pitch classes rather than chords, so conclusions about chord progressions remain tentative. These differences in pitch distribution could contribute to the overall appeal of the songs and could be further explored in future analyses.
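To make the idea of "concentration around a pitch class" concrete, the following Python sketch collapses a chromagram into a single duration-weighted pitch-class profile and ranks the strongest pitch classes. It assumes the recording is available as segments, each with a duration and a 12-element chroma vector. The helper names and the segment values are hypothetical and do not come from either song.

```python
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chroma_vector(strengths):
    """Build a 12-element vector from a {pitch_class_name: strength} dict."""
    return [strengths.get(name, 0.0) for name in PITCH_CLASSES]


def chroma_profile(segments):
    """Duration-weighted sum of chroma vectors, scaled so the profile sums to 1.

    `segments` is a list of (duration_seconds, chroma_vector) pairs.
    """
    totals = [0.0] * 12
    for duration, chroma in segments:
        for i, strength in enumerate(chroma):
            totals[i] += duration * strength
    norm = sum(totals)
    return [t / norm for t in totals] if norm else totals


def dominant_pitch_classes(profile, top=3):
    """Names of the `top` strongest pitch classes, strongest first."""
    ranked = sorted(range(12), key=lambda i: profile[i], reverse=True)
    return [PITCH_CLASSES[i] for i in ranked[:top]]


# Hypothetical segments with most of their energy on D.
segments = [
    (0.5, chroma_vector({"D": 1.0, "A": 0.4})),
    (0.3, chroma_vector({"D": 0.9, "F#": 0.5})),
    (0.2, chroma_vector({"D": 1.0, "A": 0.3, "G": 0.2})),
]
profile = chroma_profile(segments)
print(dominant_pitch_classes(profile))
```

A profile dominated by one pitch class would correspond to the concentrated pattern described for “State of Unrest,” while a flatter profile would correspond to the more spread-out pattern.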
A cepstrogram is a visual representation of how the timbral characteristics of a musical recording change over time. Comparing the cepstrograms of two songs, “State of Unrest” from the playlist of disliked songs and “Shut up and Dance” from the playlist of liked songs, reveals interesting differences in their timbral distribution. In “State of Unrest,” the timbral characteristics are strongly concentrated in a narrow range, indicating a relatively stable sonic texture. “Shut up and Dance,” on the other hand, shows more variation, with a wider range of timbral characteristics. This suggests that “Shut up and Dance” has a more complex and varied sound texture. These differences in timbral distribution could contribute to the overall appeal of the songs and could be further explored in future analyses.
A self-similarity matrix is a visual representation of the similarity between different sections of a musical recording. Comparing four self-similarity matrices for two songs, one timbre-based and one chroma-based matrix each for “State of Unrest” and “Shut up and Dance,” reveals interesting differences in their internal structure.
In the self-similarity matrix for the timbral characteristics of “State of Unrest,” there are clearly defined blocks, indicating repeated patterns in the sound. In contrast, the timbral self-similarity matrix for “Shut up and Dance” shows a more continuous, fluid structure, suggesting a more unpredictable and dynamic sound.
Similarly, the chroma self-similarity matrix for “State of Unrest” shows a highly repetitive structure with clear diagonal lines, indicating the presence of repeated chord progressions. On the other hand, the chroma self-similarity matrix for “Shut up and Dance” shows a more dispersed and varied pattern, indicating a more diverse harmonic structure.
These differences in internal structure could contribute to the overall appeal of the songs and provide insights into the musical composition and arrangement. Further analyses could explore the relationship between these structural characteristics and the emotional and perceptual responses to the music.
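The construction behind a self-similarity matrix can be sketched in a few lines of Python. The sketch assumes the recording has already been reduced to one feature vector per time frame (chroma or timbre) and compares every frame with every other frame using cosine distance. The choice of cosine distance is an assumption, and the frames are hypothetical toy vectors, not data from either song.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity; 0 means the two frames point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero frame as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def self_similarity(frames):
    """Square matrix of pairwise distances between all frames."""
    n = len(frames)
    return [[cosine_distance(frames[i], frames[j]) for j in range(n)] for i in range(n)]


# Hypothetical frames following the section pattern A, A, B, B, A, A.
a = [1.0, 0.2, 0.0]
b = [0.0, 0.3, 1.0]
frames = [a, a, b, b, a, a]

matrix = self_similarity(frames)
for row in matrix:
    print(" ".join(f"{d:.2f}" for d in row))
```

In the printed matrix, low-distance squares along the main diagonal correspond to the homogeneous "blocks" discussed above, and low-distance entries away from the diagonal mark material that returns later in the piece.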
For the tempograms, I selected the two songs from the ‘Songs I like’ playlist with the lowest and highest c01 components, based on the track-level summary (see track-level-summary). Specifically, I chose ‘Africa’ by TOTO, which has the lowest c01 value (36.25), and ‘Starstruck’ by Years & Years, which has the highest (54.77). This selection was made to investigate potential differences in tempo between the songs.
From the plots, it can be observed that both songs have a relatively steady tempo with occasional variations. However, ‘Starstruck’ exhibits more frequent tempo changes. The c01 component measures the overall loudness of the song, so this difference may be related to the song’s higher c01 value, although the tempograms alone cannot establish that.
In conclusion, selecting songs by their c01 values allowed for an exploration of potential differences in tempo patterns between songs at opposite ends of the c01 (loudness) range. It is also worth noting that both tracks keep a largely consistent tempo, which fits my tendency to prefer songs with a steady tempo.
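A tempogram requires analysis of the audio itself, but the notion of a "steady tempo" can be approximated much more crudely from beat timestamps alone. The Python sketch below is a simpler stand-in, not a tempogram. It assumes a list of beat onset times in seconds is available and reports the coefficient of variation of the local tempo. The function names and both beat sequences are hypothetical.

```python
from statistics import mean, pstdev


def local_tempi(beat_times):
    """Convert consecutive beat timestamps (seconds) into local tempi in BPM."""
    return [60.0 / (later - earlier) for earlier, later in zip(beat_times, beat_times[1:])]


def tempo_variability(beat_times):
    """Coefficient of variation of the local tempo (0 means perfectly steady)."""
    tempi = local_tempi(beat_times)
    return pstdev(tempi) / mean(tempi)


# Hypothetical beat grids: one exactly at 120 BPM, one that drifts.
steady = [i * 0.5 for i in range(9)]
drifting = [0.0, 0.5, 1.0, 1.45, 1.9, 2.5, 3.1, 3.6, 4.1]

print("steady:", round(tempo_variability(steady), 3))
print("drifting:", round(tempo_variability(drifting), 3))
```

A larger value indicates more tempo fluctuation, which is roughly what more frequent changes in a tempogram would reflect.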
Based on the insights gained from the feature analysis, it appears that there are clear patterns in the musical features that I find appealing and unappealing in songs. This raises the possibility of developing a classification model to predict whether or not I would like a particular song based on its acoustic features.
For example, a decision tree model could be trained on a dataset of songs that I have rated as either liked or disliked, using features such as loudness, energy, valence, and mode as predictors. The resulting model could then be used to predict the likelihood of me liking a new song based on its acoustic features.
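As a rough illustration of the idea, the Python sketch below fits the simplest possible decision tree, a single split (often called a decision stump), by choosing the feature and threshold with the lowest weighted Gini impurity. It is hand-written for illustration and is not the model used in this portfolio. A real analysis would more likely use a library implementation with deeper trees and a held-out test set. All feature values and labels are made up.

```python
def gini(labels):
    """Gini impurity of a list of boolean labels (True = liked)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def fit_stump(rows, labels, features):
    """Find the single feature/threshold split with the lowest weighted Gini impurity."""
    best = None
    for feature in features:
        values = sorted({row[feature] for row in rows})
        # Candidate thresholds lie midway between consecutive observed values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [lab for row, lab in zip(rows, labels) if row[feature] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[feature] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best["score"]:
                best = {
                    "feature": feature,
                    "threshold": threshold,
                    "score": score,
                    # Each side of the split predicts its majority label.
                    "left_likes": sum(left) * 2 >= len(left),
                    "right_likes": sum(right) * 2 >= len(right),
                }
    return best


def predict(stump, row):
    """Predict True (like) or False (dislike) for one track."""
    side = "left_likes" if row[stump["feature"]] <= stump["threshold"] else "right_likes"
    return stump[side]


# Hypothetical training data: three liked tracks, then three disliked tracks.
rows = [
    {"loudness": -5.1, "energy": 0.81, "valence": 0.72, "mode": 1},
    {"loudness": -6.0, "energy": 0.74, "valence": 0.58, "mode": 0},
    {"loudness": -4.4, "energy": 0.88, "valence": 0.90, "mode": 1},
    {"loudness": -7.9, "energy": 0.96, "valence": 0.21, "mode": 0},
    {"loudness": -8.3, "energy": 0.94, "valence": 0.35, "mode": 1},
    {"loudness": -7.2, "energy": 0.97, "valence": 0.18, "mode": 0},
]
labels = [True, True, True, False, False, False]

stump = fit_stump(rows, labels, ("loudness", "energy", "valence", "mode"))
new_track = {"loudness": -5.5, "energy": 0.80, "valence": 0.65, "mode": 1}
print(stump["feature"], round(stump["threshold"], 2), predict(stump, new_track))
```

With toy data this cleanly separated, several features split the classes perfectly and the stump simply keeps the first one it finds. Real playlists overlap far more, which is why a deeper tree and more features would be needed.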
While the plots provide some valuable insights into the musical features that I find appealing or unappealing, it’s important to note that these are just a few of the many features that could potentially influence my musical taste. A more accurate classification model would need to take into account a broader range of features, such as tempo, rhythm, instrumentation, and genre, among others. Additionally, the model would need to be trained on a larger and more diverse set of songs to ensure that it can accurately classify songs that I like or dislike across a wider range of styles and genres. Nevertheless, the insights gained from these plots provide a good starting point for developing a more comprehensive model of my musical taste.